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Abstract 



The present paper discusses the problem of least-squares over the real 
' symplectic group of matrices Sp(2n, R). The least-squares problem may 

\ be extended from fiat spaces to curved spaces by the notion of geodesic 

\ • distance. The resulting non-linear minimization problem on manifold may 

' be tackled by means of a gradient-descent algorithm tailored to the geom- 

O . etry of the space at hand. In turn, gradient steepest descent on manifold 

may be implemented through a geodesic-based stepping method. As the 
space Sp(2n, R) is a non-compact Lie group, it is convenient to endow 
^ , it with a pseudo-Riemannian geometry. Indeed, a pseudo-Riemannian 

■ metric allows the computation of geodesic arcs and geodesic distances in 

| closed form on Sp(2n, R). 

00 ' 
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1 Introduction 

Least-squares problems over compact Lie groups have been extensively studied 
. due to their broad application range. A feature of these problems is that the 

domain of the objective function to optimize, being a compact Lie group, may 
be endowed with the structure of a differential manifold with a bi-invariant 
Riemannian metric. The formulation of least-squares problems on compact 
Riemannian Lie groups relies on closed forms of geodesic curves and geodesic 
distances. 

On the Euclidean matrix space K™ xm , a (weighted) least-squares problem 
may be formulated via the criterion function / : R n x m — »■ : 

/(^'^C^-TfeH 2 , (1) 

k 

where Tfc S M. nxm denote optimization target matrices, at > denote weights 
and symbol || • || denotes the Frobenius matrix norm. On a Riemannian curved 
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manifold M, the above least-squares problem may be reformulated via the gen- 
eralized criterion function / : M — > R^~ : 



where € M and the function d(-, •) denotes a geodesic distance on the mani- 
fold M corresponding to the metric that the manifold is endowed with. 

On non-compact Riemannian Lie groups, the problem of formulating a least- 
squares criterion and of its optimization is substantially more involved, because 
it might be hard to compute geodesic distances in closed form. However, a 
non-compact Lie group may be treated as a pseudo-Riemannian manifold with 
a bi-invariant pseudo-Riemannian metric [20]. 

One of such non-compact Lie groups is the real symplectic group. 

Real symplectic matrices form an algebraic group under standard matrix 
multiplication and inversion, denoted as Sp(2n,R). The least-squares problem 
over the set of real symplectic matrices plays an important role in applied fields. 
Some applications are: 

• Quantum computing [3J \W\ : An important application of quantum me- 
chanics is quantum computing, as quantum mechanics allows for informa- 
tion processing that can not be performed classically. In particular, it may 
be possible to design algorithms on a quantum computers that are more 
efficient than on a classical computers. 

• Control of beam systems in particle accelerators 8 : Lie-group tools may 
be applied to the characterization of beam dynamics in charged-particle 
optical systems. These methods are applicable to accelerator design, 
charge-particle beam transport and electron microscopes. Lie-group meth- 
ods potentially provide a way to analyze and control non-linear behavior 
with the same completeness of linear behavior. 

• Computational ophthalmology [171 118) : In the study of optical systems in 
ophthalmology, it is assumed that the optical nature of a centered optical 
system is completely described by a real symplectic matrix. 

• Vibration analysis |23j : Transfer matrices are widely used for the dynamic 
analysis of engineering structures as well as for static analysis, and are par- 
ticularly useful in the treatment of repetitive structures. Transfer matrices 
are real symplectic. 

• Control theory 10 : Real symplectic matrices find applications in linear 
control theory for discrete-time systems. 

Other applications may be found in coding theory [3] and in time-series predic- 
tion 0. 

Although some noticeable results are available about the real symplectic 
group [HI [TU] , optimization on the real symplectic group appears to be far less 
studied than for other Lie groups. In the present manuscript, we discuss the 
problem of least-squares on the real symplectic group. After a review of results 
known from literature about optimization on the real symplectic manifold, an 
optimization method based on endowing it with a pseudo-Riemannian geometry 
will be discussed. 
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2 Criterion optimization on Riemannian and on 
pseudo-Riemannian manifolds 

For a reference on differential geometry, see [23] . 

Let M be a smooth manifold. The tangent space at x £ M to the manifold 
is denoted by T X M. A metric on M is a non-degenerate, smooth, symmetric, 
bilinear map which assigns a real number to pairs of tangent vectors at each 
tangent space of the manifold M. Let us denote the metric by (-, -) x : T X M x 
T X M -4 M. 

A Riemannian manifold is a smooth manifold endowed with a positive defi- 
nite metric, namely: 

(v, v) x >0,Vv£ T X M. (3) 

dcf — 

The metric (•, -) x also defines the norm \\v\\ x = (v, v) x for v £ T X M . 

The geodesic curve connecting two points x\,x^ £ M is the curve G(t) £ M, 
parameterized by t £ [0 1], that minimizes the energy integral: 

l 

{x,x) x dt, (4) 

under the normalization condition that (G, G)q is constant. By the calculus of 
variation on manifold, the geodesic equation may be written in normal form as: 

x + T x (x,x) = 0. (5) 

In the above expressions, the over-dot and the double over-dot denote first-order 
and second-order derivation with respect to parameter t, respectively, while 
symbol denotes the Christoffel operator. The solution of the geodesic 

equation may be written in terms of two known quantities that serve as bound- 
ary conditions for the second-order geodesic differential equation. The values 
x = x(Q) £ M and v = x(0) £ T X M might be specified, in which case the 
solution of the geodesic equation ([5]) will be denotes as G x>v {t). 

The squared geodesic distance between the geodesic's endpoints is defined 
as: 2 

cP{x u x 2 p(j^{G,G)ldt S j =(G,G) G | t=o . (6) 

Note that if the geodesic curve is expressed as G XiV (t), then the squared geodesic 
distance equals \\v\\ x . 

The gradient V x f £ T X M of a regular criterion function / : M — > R in a 
point x £ M may be defined as: 

v,/ - (#)*, (7) 

where symbol fl denotes the 'sharp' isomorphism and symbol df denotes differ- 
ential. On a Riemannian manifold, the Riemannian gradient may be computed 
by the metric compatibility condition: 

(d x f,vf = (d x f,v) x , Vv £ T X M, (8) 

where symbol (•, -} E denotes Euclidean metric. 
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When a Riemannian manifold of interest M and a regular criterion func- 
tion / : M —> R are specified, a known optimization rule is 'gradient steepest 
descent', that may be expressed as the differential equation on manifold: 

± = -V x /. (9) 

The gradient flow x(t) associated to this system tends toward a local minimum 
of the criterion function /, in fact: 

/ = (V x /,i) a = -||V a! /||S<0, 

with equality holding if and only if V x f = 0, namely, when the flow x(t) ap- 
proaches a stationary point of the criterion /. The optimization system © is 
based on the knowledge that the Riemannian gradient V x f £ T X M points to- 
ward the direction of the maximum growth of the function / around the point 

x e M. 

The basic idea to implement the optimization scheme represented by equa- 
tion © is to replace the continuous-time state- variable x(t) £ M with a discrete- 
time state-variable Xk £ M. This operation requires a numerical scheme of 
integration of the equation ([9]). 

The differential equation (jH]) may be solved numerically by the help of the 
notion of geodesic curve, namely, by the numerical optimization algorithm [8J 

urn]: 

Xk+x = G Xh -v Xk f(rj), (fO) 

where r/ £ [0 1] denotes an integration step-size and G x>v (t) denotes a geodesic 
arc departing from the point x £ M with initial direction v £ T X M and param- 
eter t £ [0 1]. 

A pseudo- Riemannian manifold is a manifold endowed with a metric that is 
not positive definite. (On a pseudo-Riemannian manifold M, the quantity \\v\\ x 
may be positive, negative or null even for ^ v £ T X M .) 

In order to extend the gradient-based optimization algorithm dTUl) to the 
case of pseudo-Riemannian manifold M, it is necessary to compute geodesic 
arcs and gradients on M. 

The basic idea to cope with pseudo-Riemannian manifolds is to decompose 
each tangent space T X M as follows: 

T+M d ={v £ T X M such that \\v\\ 2 x > 0}, 
T°M d ={w € T X M such that = 0}, (11) 
T-M d ={v £ T X M such that \\v\\ 2 x < 0}. 

The notion of geodesies extends to pseudo-Riemannian manifolds by the 
calculus of variation on the energy integral (TJ|. A geodesic arc G x>v (t) will be 
the solution of the differential equation (JSJ) - On a pseudo-Riemannian manifold 
M, however, it holds ||i>||^ < for v £ T~M, therefore the notion of squared 
pseudo-Riemannian geodesic distance may be defined as: 

d 2 (x 1 ,x 2 ) = \\\v\\ x \ 2 . (12) 

Likewise, pseudo-Riemannian gradient defines as V x f — (d/)" again. It is, 
however, worth remarking that the pseudo-Riemannian gradient V x / does not 
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point to the direction of the maximum growth of the function /. Therefore, the 
optimization equation ^ needs to turn into: 

-V x /ifV,/6T+MUT°M, 
V,/ if V x f G T-M, ^1 

whose solution is a minimization flow. In fact, it induces the dynamics: 

/_/ -||V x /iifV x /eT+MUT x °M 1 ( 
f -\\\V x f\\lXV x feT-M J- ' (14) 

The equation (| 1 3[) may be solved numerically by the optimization algorithm: 

r _ / G* k ,-v Xk f(v) if V./ G T+M U T°M, 
+1 ~ I Gx^/WifV./GT-M, ^ 

where symbol G x ^ v (t) denotes a pseudo-Riemannian geodesic arc departing from 
the point x £ M with initial direction v G T^M and parameter f e [0 1] and 
the succession Xk £ M denotes an approximation of the actual solution of the 
differential equation (fl"3|) . 



3 The real symplectic group 

The present section aims at recalling the definition of the real symplectic group 
and its properties, along with some recent results about optimization on it. 

3.1 Definitions and properties 

The real symplectic group is defined as follows: 

Sp(2n,R) d ={z G R 2nx2n \x T q 2n x = q 2n }, (16) 



def 
q2n = 



n c n 



(17) 



where symbol e„ denotes the n x n identity matrix, while symbol 0„ denotes a 
whole-zero n x n matrix. The skew-symmetric matrix qi n enjoys the following 
properties: 

lln = ~ e 2n: (!8) 
^r! = OL = -I2n- (19) 

From now on, the subscript In on the symbol q2 n will drop for a tidier notation. 

The space Sp(2n, R) is a curved smooth manifold that may also be endowed 
with an algebraic-group structure (namely, group multiplication and group in- 
verse and possesses a identity element) in a manner that is compatible with the 
manifold structure. Therefore, the space Sp(2rt,R) has the structure of a Lie 
group (of dimension n(2n + 1)). 

In particular, standard matrix multiplication and inversion work as algebraic- 
group operations. For the matrix multiplication, the property may be proven 
by noting that, for every x,y € Sp(2n,R), the product xy £ Sp(2n, R), in fact: 

(xy) T q(xy) = y T (x T qx)y = y T qy = q. 
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Any symplectic matrix x <E Sp(2n,R) is such that det 2 (x) = 1, where symbol 
dct(-) denotes determinant. Therefore, symplectic matrices are invertible. It 
may be proven that if matrix x 6 Sp(2n,R), then also matrix x^ 1 G Sp(2n,R) 
by negation, namely, by trying to prove that (x _1 ) T qx~ x ^ q. It holds: 

[x^ 1 ) 1 qx^ 1 i= q => —(x T )^ 1 qx^ 1 ^ —q q ^ x T qx, 

that would contradict the hypothesis that x G Sp(2n,R). The identity element 
of the group Sp(2n,R) is clearly the matrix e^n- 
In addition, the following identities hold: 

det(x) = 1, (20) 
x T = -qx^ x q, (21) 
x~ T = -qxq. (22) 

From the identity (I2TJ1) . it follows that the group Sp(2n,R) is a subgroup of 
Sl(2n,R) as well as of Gl(2n,R). The proof of identity $ZU§ is far from trivial 

The tangent space Ta;Sp(2?T,, R) has structure: 

r x Sp(2n,R) = {v G R 2nx2n \v T qx + x T qv = 2 „}. (23) 

The tangent space at the identity of the Lie group, namely the Lie algebra 
sp(2n,R), has structure: 

sp(2n, R) = {he R 2nx2n \h T q + qh = 0} (24) 

and it coincides with the space of 2n x 2n Hamiltonian matrices, in fact. By 
the embedding of the manifold Sp(2n,R) into the Euclidean space R 2nx2 ™, the 
embedded real symplectic group may be endowed with normal spaces as well, 
as: 

7V 3; Sp(2n,R) d = f {n G R 2nx2 "|tr(n T w) = 0,V« G T x Sp(2n, R)}, (25) 

where symbol tr(-) denotes the trace operator. The tangent space, the Lie 
algebra and the normal space associated to the real symplectic group may be 
characterized as follows: 

T x Sp(2n,R) = {xqs\s G R 2nx2n , S T = s }, 
sp(2n,R) = {qs\s G R 2 " x2 «, S T = s }, 
A^Sp(2n,R) = {qxuj\uj G R 2 " x2 ", uF = -u}. 

A noteworthy property of symplectic matrices is as follows. Let a; G Sp(2n, R), 
v G T x Sp(2n,R) and y G R 2nx2 ". The following identity holds: 

tr(a; _1 qyqx^ 1 v) = tr(y T v). (26) 

To prove such identity, use first the parametrization v = xqs, with s = s T £ 
R 2 " x2 ™. Then it holds: 

ti{x~ 1 qyqx~ 1 v) — tr(x~ 1 qyqx~ 1 (xqs)) — ~tr(x~ 1 qys) ~ — tr(sx _1 qy) = 
—tr(y T q T x~ T s T ) — ti{y T q{— qxq)s) = ti(y T xqs) = tr(y T v). 
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Note that the identity (|2"6]l holds for any real- valued matrix y of appropriate 
size. 

A result concerning optimization on the manifold Sp(2n, M) is adapted from 
[3]. Let a : sp(2n,R) — >• sp(2n,R) be a symmetric positive-definite operator 
with respect to the Euclidean inner product (•, -} E on the space sp(2n, R) given 
by tr(hjh 2 ) for every hi, h% £ sp(2n, E). The minimizing curve of the integral: 

(h,cr(h)} E dt 

over all curves x(t) £ Sp(2n, R) with t £ \t\ £2] an d with fixed endpoints x{ti) = 
x\ £ Sp(2n,R) and x(t-i) = X2 £ Sp(2n,R), and where /i is defined by i = as/i, 
so that h G sp(2n, R), is the solution of the system: 

x = xh, 

m = a T {h)m- ma T (h) , (27) 
h = cr _1 (m), 

where symbol cr _1 denotes the inverse of the operator a. 

The simplest choice for the symmetric positive-definite operator a is o{h) — 
h, which corresponds to a Riemannian metric for the real symplectic group 
Sp(2n, R) given by: 

(u, v) x = tiUx^ufix^v)), Vu, v £ T x Sp{2n, R). (28) 

Such a metric leads to the 'natural gradient' on the space of real invertible 
matrices Gl(n, R) studied in pQ. The above choice for the operator u implies 
that m — h and that the corresponding curve on the real symplectic group 
satisfies the equations: 

h = h T h - hh T 

h = 



( 29 ) 



or, in normal form: 

x — xx~ 1 x + xx T qxqx^ 1 x — xx T qxq = 0. (30) 

The above equations describe geodesic arcs on the real symplectic group corre- 
sponding to the metric (|28p . Closed form solutions of the above equations are 
unknown to the authors of [1] and to the present author. 

The Riemannian gradient V x / of a regular function / : Sp(2n,R) — > R 
corresponding to the metric (|28l) may be calculated as the unique vector in 
T!i;Sp(2n, R) that satisfies the following compatibility condition: 

tr (v T d x f) = tr ((x- 1 v) T (x- 1 V x f)) , Vw e T x Sp(2n, R). (31) 

By recalling the structures of the tangent and normal spaces to the real sym- 
plectic group, the solution of the above equation is found as: 

V*/ = xq(u> - x~ 1 qd x f), 

1 -1 1 T 

U3 = -X qd x f + -(d x f) xq, 

from which the expression of the Riemannian gradient associated to the metric 
(|2"g]l follows: 

V,/ - \xq ((d x ffxq - qx T d x f) . (32) 
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3.2 Symplectic group as a pseudo-Riemannian manifold 

Let us consider the following pseudo-Riemannian metric on the general linear 
group of matrices Gl(n) [2"U] : 

(u, v) x d =tr(x~~ 1 ux~ 1 v), Vu, v £ T x Gl(n). (33) 

Under the above pseudo-Riemannian metric, it is indeed possible to solve 
the geodesic equation in closed form. The energy integral in this case reads: 

1 

tT((x~ l x) 2 )dt, (34) 

and the corresponding geodesic curve and the squared geodesic distance have 
the following expressions: 

Gx,v(t) — xexp(te _1 w), (35) 
d 2 (xi,x 2 )=tr(\log 2 (xi 1 x 2 )\). (36) 

On the general linear group Gl(n), matrix logarithm and exponential may be 
defined by the series: 

logy d = f -£&rtf, y€ Gl( n ), (37) 

fc=l 
OO 

expy d = If' y e ( 38 ) 

k=0 

The pseudo-Riemannian gradient of a regular function / : Sp(2n,R) — >• R 
associated to the metric (|33|) is the solution of the compatibility condition: 

tr(5j/u) = tr(x~' 1 V x fx- 1 v), Vw S T x Sp(2n,K). (39) 

that reads: 

V x / = ixg (x T d x fq - q{d x f) T x) . (40) 

It is straightforward to show that the above vector V x f belongs to T x Sp(2n, R). 
It may be shown that it satisfies the condition (|3"9")l by the help of the identity 

H2S). 

Having chosen a pseudo-Riemannian metric for the real symplectic group, 
the expression of the geodesic arc as well as the expression of the geodesic 
distance may be computed in closed form. The least-squares problem: 



f(x)^J2d 2 (x,T k ), (41) 
k 



may be thus set up, where target matrices Tk belong to Sp(2n,R). Moreover, 
the numerical scheme (|15p may be effectively implemented. 

The Euclidean gradient of the criterion function (|4ip has expression: 

a K ^d 2 (x,T fc ) - 2^(log(r- 1 a:) S - 1 ) T sign(tr(log 2 (x" 1 T fc ))) . (42) 
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4 Numerical tests 



The numerical behavior of the developed least-squares algorithm will be exam- 
ined via two different tests. A numerical test concerns the computation of the 
intrinsic mean of a set of given real symplectic matrices. A further numerical 
test concerns the interpolation of two given real symplectic matrices. 

4.1 Computing the intrinsic mean of a set of real symplec- 
tic matrices 

As a numerical test about least squares on the real symplectic group, consider 
the computation of the mean of a set of given real symplectic matrices. (An 
application is described in |18j . Some general remarks on averaging over Lie 
groups may be found in |14j.) 

On a metrizable manifold M, the 'intrinsic mean' may be defined as |15U19| : 

^i d = arg min S~] d 2 (x, r fe ) , (43) 

k 

where the matrices t% s M are distributed around a center-of-mass to be esti- 
mated by /i s M. The minimum value of the criterion function in (I43[) is the 
'intrinsic variance' of the distribution, namely: 

- 2 = ^E d W)- (44) 

k 

Setting M = Sp(2n,R), the problem (|43|) is a least-squares problem on the real 
symplectic group Sp(2n,KL). 

The Figure Q] shows a result obtained with the iterative algorithm (fl"5|) for 
n = 5. The picture shows the value of the criterion function ^ fc d 2 (x, Tfc) 
as well as the value of the squared norm of its pseudo-Riemannian gradient 
during iteration. In the shown example, the squared norm assumes negative as 
well as positive values during optimization. The Figure also shows the squared 
distances d 2 (x,Tk) before iteration (with initial guess chosen as x = eio) and 
after the iteration. The Figure shows that the algorithm converges steadily 
toward the minimal variance (in fact, the distances from the found center of 
mass are much smaller than the distances from the initial guess). 

The group Sp(10,R) is a 55-dimensional manifold whose elements are rep- 
resented by 10 x 10 real-valued matrices. It is impossible to render graphically 
the target matrices Tk, their center of mass and the computed mean matrix. An 
illustration of the distribution of the target matrices around the actual center 
and of the computed mean may, however, be gotten through the application 
of an appropriate dimensionality reduction technique. We chose the 'multidi- 
mensional scaling' (MDS, see Appendix A) as dimensionality reduction method 
onto the real plane R 2 . The result is depicted in Figure From this Figure, it 
is possible to appreciate how close the computed (empirical) intrinsic mean lies 
to the actual center of mass. 

A close-up of the numerical behavior of the least-squares optimization algo- 
rithm (TT5|) comes from the examination of the case n = 1. The group Sp(2, R) is 
a 3-dimensional manifold (in fact, Sp(2,M) = S1(2,R)), therefore the following 
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Figure 1: Optimization over the real symplectic group Sp(10,R). Variance 
during iteration, value of the squared norm of criterion pseudo-Riemannian 
gradient during iteration, the squared distances d 2 (x,Tk) before iteration and 
after iteration. 



parametrization may be taken advantage of: 



3 (a,b,c) -S> 



a b 
c d 



GSp(2,R). (45) 



(In fact, the fourth parameter d is constrained to the three free parameters a, 
b and c by the symplecticity condition ad — be — 1.) Hence, the elements of the 
group Sp(2,R) may be rendered on a 3-dimensional figure. 

The Figure [3] shows a result obtained with the iterative algorithm (1T51 for 
n = 1. The Figure shows the location of the target matrices Tk (circles), the 
location of the center-of-mass (cross), the trajectory of the optimization algo- 
rithm over the search space (solid-dotted line) and the location of the final point 
computed by the algorithm (diamond) . The Figure shows that the algorithm is 
convergent toward the center of mass. The Figure 0] shows the variance during 
iteration, the value of the squared norm of criterion pseudo-Riemannian gradi- 
ent during iteration, the squared distances d 2 (x,Tk) before iteration and after 
the iteration. 
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Figure 2: Optimization over the real symplectic group Sp(10,R): Visualization 
via MDS of the cloud of points representing target matrices (open circles), the 
center of mass (cross) and the computed mean (open diamond). 



4.2 Interpolating over the real symplectic group 

As a further numerical test, consider the interpolation of two given real sym- 
plectic matrices. (An application is described in [2"2"].) 

On a metrizable manifold M, the continuous interpolate between two given 
matrices T\,T2 € M may be defined as the curve x : [0 1] — > M such that: 



2(t) d =arg min [(1 - t)d J (z(t), n) + ttC(z(t), r 2 )], 
z(t)ec 



(46) 



where C denotes the set of smooth curves that perform the mapping [0 1] — > M. 
Setting M = Sp(2n,R), the problem (|4"6"|) is a least-squares problem on the real 
symplectic group Sp(2n,R). 

The solution of the above least-squares problem may be given in closed form. 
It coincides to a geodesic arc G XiV (t) parameterized so that G X}V {0) — t\ and 
Gx,v(l) = t~2, namely: 



G(t)=riexp(tlog(Tf 1 T 2 )). 



(47) 



The Figure [S] shows a result of continuous interpolation for n — 1 (the 
parametrization (|4"5j) has been used again). The Figure shows the location of 
the endpoint matrices t\,t% G Sp(2,R) (dotted-circles) and of the interpolates 
(dots). 
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Figure 3: Optimization over the real symplectic group Sp(2, R). Target matrices 
Tfc are denoted by circles. Center-of-mass is denoted by a cross mark. Trajectory 
of the optimization algorithm is denoted by a solid-dotted line. Last point of 
the trajectory is denoted by a diamond mark. 



5 Generalizations and applications 

The present manuscript focuses on least-squares problems on the real symplectic 
group. The developed least-squares theory applies, however, to general non- 
compact manifolds. 

Least-squares problems arise in a series of applications in signal processing, 
machine learning, pattern recognition and computational statistics. Here follows 
a non-exhaustive list of possible applications of least-squares methods on non- 
compact manifolds: 

• Computation of the empirical mean value on manifold: Given a 'cloud' of 
points Tfe e M on a manifold M endowed with a distance function d(-, •), 
its center of mass \i is defined as: 

/x^arg min d 2 (a;, Tfe). (48) 

k 

The empirical mean value \x of a distribution of points on a manifold is 
instrumental in several applications. The mean value \x is, by definition, 
close to all points in the distribution, therefore, the tangent space T^M 
may serve as reference tangent space in the development of algorithms on 
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Figure 4: Optimization over the real symplectic group Sp(2, R). Variance during 
iteration, value of the squared norm of criterion pseudo-Riemannian gradient 
during iteration, the squared distances d (x, rfe) before iteration and after iter- 
ation. 



the manifold M (likewise the Lie algebra associated to a Lie group serves 
as reference tangent space in Lie-group theory). 

• Computation of the empirical high- order moments on manifold: Given a 
cloud of N points e M on a manifold M endowed with a distance 
function d(-, •), its scalar m th -order (centered) moment is defined as: 

A^^E^ 7 "^ ™> 2 ' (49) 

k 

where fi G M denotes the empirical mean of the cloud. The moment \xi 
denotes the variance a 2 , which measures the width of the cloud around 
its center. 

• Applications based on the statistical distributions of data-points: Some 
statistical techniques, such as maximum likelihood estimation, are based 
on assumptions about the statistical distributions of the data-points. In 
the present context, it could be worth considering the distribution of the 
distances of the points from their center, namely, by defining the fol- 
lowing exemplary distributions: Gaussian p(r) ~ exp ( — - g^r^ ) , Lapla- 
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Figure 5: Interpolation over the real symplectic group Sp(2,R). Endpoint ma- 
trices T\,T2 are denoted by dotted circles. Computed interpolates are denoted 
by dots. 



cian p(r) ~ exp (— Xd(r, fi))) and Rayleigh p(r) ~ d{r, fi) exp y 2a^~) > 

where r e M denotes the manifold- valued variable of interest, \i e M de- 
notes the center of mass of the distribution, a 2 denotes the variance and 
A > denotes a dispersion parameter. 

• Projection of a point on a curve: Given a point r £ M and a smooth 
curve c x . v : [—a a] — > M such that c Xi „(0) = x, the 'foot' of the projection 
of the point r on the curve c X; „ is defined as the point c x ^ v {<p), with: 



It is assumed that the projection is well-defined, namely, that the foot- 
parameter value (f> is unique. The ( scalar) projection of the point r on the 
curve c X; „ is defined as: 



The projection computes as the (signed) distance (measured along the 
curve c XyV ) between the foot of the projection and the origin of the curve 




<p = arg mm a (r, c 

t£[— a a] 



(*))• 



(50) 




(51) 
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x G M. The projection represents, therefore, the analogous of the orthog- 
onal projection on Euclidean spaces. In the special case that the curve 
c XjV is a geodesic arc on the manifold M endowed with the metric (•, •)., 
then it holds: 

<,,„ = 4>VM~ X - (52) 

Moreover, the quantity d(r, c XiV (<p)) denotes the distance of the point r G 
M from the curve c Xi „. 

Interpolation and modeling: Given a set of points T k G M on a manifold 
M endowed with a distance function d(-, •) and a curve c X:V : [—a a] — >• M, 
interpolation/modeling is about finding parameters: 

(x,v)=arg min d 2 (r fe , c x ,„(^ fc )), (53) 

k 

with projection- feet defined as: 

fc d =arg min d 2 (T k ,c XyV (t)). (54) 

t£[— a a] 

The curve c SjC (i) is an interpolator with optimal parameters. 

Principal geodesic analysis: First principal component analysis and one- 
unit independent component analysis may be extended to manifolds as 
follows. Let it be given a cloud of points T k G M on a manifold M endowed 
with a geodesic distance function d(-, •) associated to a geodesic curve G x _ v . 
Let \i G M denote the empirical mean of the cloud. The principal geodesic 
arc that captures the largest data-power may be defined as G MjC with: 

v = arg max ^- d 2 (/i, G^ v {(j> k )), under (v, u) M = 1, (55) 

veT^M iV 

fe 

where each <j>k denotes the foot-parameter associated to the projection of 
Tfc over the geodesic arc. (The normalization condition plays the same role 
as in linear principal component analysis). As the geodesic distance are 
computed over geodesic arcs, the above expression simplifies as: 

v — arg max > d>i. (56) 

k 

Likewise, the principal geodesic arc that corresponds to the largest m th 
moment finds by: 

v = arg max 6T . (57) 

k 

The principal geodesic arc that maximizes the projection variance may 
be used to 'skeletonize' data on manifolds and for data compression. In 
fact, a lossy representation of a data r k is given by its foot-parameter 
<j>k ■ A principal geodesic arc that maximizes a high-order moment of the 
projection may be used in binary discriminant analysis to discern between 
data that possess or not possess certain high-order statistical features. 
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• Fisher discriminant analysis on manifold: Let data-points 6 M be- 
longing to two classes C\ and Ci of cardinality N\ and A 2 , respectively, 
be given. Let be the intrinsic mean of data-points in class C\ and /i2 
be the intrinsic mean of data-points in class C<i. Let G XyV be a geodesic 
curve on the manifold M and d(-,-) be the associated geodesic distance 
function on M. Let <pi and 02 denote the foot-parameters associated to 
the projection of fjb\ over the geodesic arc and of fi2 over the geodesic 
arc, respectively, and let 4>k denote the foot-parameter associated to the 
projection of over the geodesic arc. The Fisher ratio associated to the 
2-class maximum discrimination problem is extended from the Euclidean 

case as: F(x, v) = 
(P(G X ,M,G X ,M 

E feeCl <P{G x , v {ct> k ), g x ,M) + £ £ fceC2 d 2 (G x ,M, G XlV fa)) ■ 

(58) 

The numerator of the above expression represents the between-class vari- 
ance which amounts to the squared difference between the means projected 
on a properly oriented curve. The denominator represents the within-class 
variance which amounts at the variance of the projected elements of the 
first class and of the second class. Note that all geodesic distances are 
measured on a geodesic arc, therefore, the above expression simplifies con- 
siderably because, for instance, d 2 (G x , v ((f>i), G XiV ((f>2) — (<^i — 4>2) 2 {v , v) x . 
Hence: 

F = ih^M. (59) 

• Kalman filtering on manifold: The state of a dynamical system evolving 
on a manifold may be estimated by setting up an appropriate least-squares 
problem. The extension from the case of Kalman filtering on Euclidean 
spaces is straightforward conceptually. An example of system model on a 
manifold M is: 

Tn+l = Gj n ,x >„(!), (6Q) 
v n = S{T n )+r n , 

where n denotes discrete time, r„ e M denotes system state, G. : . denotes 
a geodesic curve on the manifold M, S(-) denotes a system operator which 
maps t 6 M 4 S(t) G T t M and r n G T Tn is a random noise term. (A map 
that sends a point from a manifold to its tangent space and that satisfies 
certain regularity conditions is termed lifting map.) More generally, the 
problem of Bayesian filtering on manifold is connected to the problem of 
extending standard state-space models from Euclidean spaces to manifolds 

Least squares problems on manifolds may generalize the quadratic assign- 
ment problem [111 that is encountered in the allocation of facilities while 
optimizing the distance between locations in combination with the flow 
between the facilities. Also, the traveling salesman problem and the plant 
location problem are special cases of the quadratic allocation problem [TT] . 
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6 Conclusion 



The present paper discusses the problem of least squares over the real symplectic 
group of matrices. The present research takes its moves from the following 
observations: 

• The least-squares problem may be extended from fiat spaces to curved 
smooth metrizable spaces by the help of the notion of 'geodesic distance'. 

• The resulting sum-of-squared-distance minimization problem on manifold 
may be tackled via a gradient-based descent algorithm tailored to the ge- 
ometry of the symplectic group through a geodesic-based stepping method. 

As the real symplectic group is a non-compact manifold, it might be hard to 
compute closed-forms quantities in a Riemannian context. Indeed, known re- 
sults from scientific literature show that it is the case. 

The key point of the present paper is to regard the real symplectic group as a 
pseudo- Riemannian manifold and chose a metric that allows for the computation 
of closed-forms of geodesic arcs and hence of geodesic distance. On the basis of 
these findings, the geodesic least-squared problem may be properly set up and 
the geodesic-based numerical stepping method may be properly implemented. 

Numerical tests have been performed with reference to the computation of 
the intrinsic mean of a collection of symplectic matrices as well as to the inter- 
polation of two symplectic matrices. Numerical results show that the pseudo- 
Riemannian-gradient-based algorithm, along with a pseudo-geodesic-based step- 
ping method, is suitable to the numerical solution of a least-squares problem 
formulated in terms of pseudo-geodesic distance. 

A Appendix: Metric Multidimensional Scaling 

One of the purposes of multidimensional scaling (MDS) [21 is to provide a visual 
representation of the pattern of proximities among a set of high-dimensional 
objects. In this instance, MDS finds a set of vectors in the two-dimensional or 
three-dimensional Euclidean space such that the matrix of Euclidean distances 
among them corresponds - as closely as possible - to some function of the 
objects' proximity matrix according to a criterion function termed stress. 

The following description provides a short introduction to MDS, as reworded 
in terms of low-dimensional representation of pseudo- Riemannian-manifold- valued 
elements. 

Let M be a high-dimensional pseudo-Riemannian manifold with distance 
function d(-, ■) and let {rk\k be a given collection of elements of M, The aim 
of MDS is to determine a collection of vectors Zk £ R p (with p — 2 or p = 3) 
that replicate the pattern of proximities among the elements This may be 
achieved by minimizing the Kruskal stress function: 

EE(ii* - - d ( r »> T *)) 2 ' ( 61 ) 

i k^i 

where symbol || • || denotes Euclidean norm. The above stress function gives rise 
to 'metric MDS'. More elaborated stress functions are available in the specific 
literature. 
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It is worth noting that the correspondence Tfc <H> Zk induced by MDS is 
not unique. In fact, if {zk}k is a minimizer of the criterion (|6ip . c G M p and 
R G 0(p) (namely, a p-dimensional rotation/reflection), also {Rzk + c}k is a 
minimizer. 

MDS may be used as a proximity /similarity visualization tool for high- 
dimensional data as it computes two-dimensional or three-dimensional vectors 
Zk, corresponding to the original elements Tfc, that captures the fundamental 
information about mutual distances. 

The axes corresponding to the coordinates of the vectors Zk, possess, in 
themselves, no particular meaning. Also, the orientation and scaling of the 
obtained visualization are arbitrary. All that matters in an MDS map is which 
point is close to which others. 

On a MDS visualization corresponding to a non-zero stress, the distances 
among objects are imperfect representations of the relationships among original 
data: The greater the stress, the greater the distortion. In general, however, 
the larger distances are represented more accurately, because the Kruskal stress 
function accentuates discrepancies in the larger distances. 
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